animbook: Visualizing changes in performance measures and demographic affiliations using animation

Performance is often reported in quantiles or ranks and often repeatedly provided over time. Between elections, voters often switch party preferences, which produces categorical values over time. Inspired by the animation in the New York Times article “Extensive data show punishing reach of racism for black boys,” we provide new visualization functions to generate animations that display this type of data in an R package called animbook. Functions to pre-process the data and prepare it for the animation are provided. The methods will be suitable generally for data where categorical variables are measured over time, and provide a way to engagingly view changes.

Krisanat Anukarnsakulchularp (Monash University), Dianne Cook (Monash University)
2023-11-27

1 Introduction

The concept of “zombie companies” began to attract attention when Caballero et al. (2008) reported on their proliferation in Japan. Zombie companies are those with an interest coverage ratio of less than one for a period of more than three years, that is, companies taking space in the market but adding no life to the economy. We would detect their presence by a drop in performance, which is a movement pattern in their relative ranking over time. More generally, movement patterns are of interest in many problems, such as identifying rocket ship start-ups that rapidly perform well or, in politics, studying voters who switch party affiliation between elections. Viewing changes between categories over time is an interesting challenge for visualization.

The New York Times provided a possible solution in the article titled “Extensive data show punishing reach of racism for black boys” (Badger et al. (2018)) to tell the story of how racism appears to inhibit socioeconomic change. This animation is the motivation for the new visualization presented here to be applied generally.

The challenge in producing an animation like the one in the New York Times article is transforming the data and connecting it with the elements of the plot that will be animated. The complexities include allowing the user to choose the number of categories, standardizing distributions, and allowing the user to input pre-computed categorical data. These considerations provide the objective for creating an R package that can generalize the animation to suitably apply to a wide range of data.

The structure of this paper is as follows. The next section explains the animation in the New York Times article and why it is relevant to the problem of studying zombie companies. The section after that describes the expected data format. Following this is an explanation of available animation tools and how they are employed for this problem. A section on the functions of the package and how the visualization is designed demonstrates how the data is mapped to the animation. The last two sections illustrate the usage of the package and applications to company performance and changing political allegiance.

2 Explanation of the New York Times visualization

Figure 1: Screenshot of the New York Times animation, which is the motivation for this visualization package.

The interactive chart featured in the New York Times article (Figure 1) unveils the issue of income disparities between black and white children who were raised in families with comparable income, according to Chetty et al. (2020). This visualization reveals that, compared to white children, black children are more likely to drop down to a lower-income group, even when both grew up in wealthy families.

In the visualization, each observation is classified into one group at the start and then transitions into either the same group or a different group. This dynamic visualization raises the question of how broadly it could be applied to other types of data. One potential use with accounting data is to convey the message, reported by McGowan et al. (2017), that zombie companies are not unique to Japan. They are also present in the United States, which has a faster metabolic rate (more new listings and exits) relative to Japan.

The political data that exhibits the movement of voters switching party affiliations between elections can be a valuable insight into the behavior of the voters. This data could be extended to incorporate demographic information about the voters, providing analysts with a significant insight into voter behavior. This allows them to consolidate effective campaigns for their political party. This also applies to marketing data, where customers shift their product interest to the competitor, providing the marketing analysts with an understanding of both the company’s products and the overall market.

This animation was developed using JavaScript-based software, namely D3.js (Bostock (2012)) and WebGL (Mozilla Foundation (2011)). D3.js is one of the most widely known libraries for creating interactive and dynamic visualizations. It enables designers to bind data to graphical elements in the DOM (Document Object Model). WebGL, on the other hand, is a JavaScript API for rendering interactive 2D and 3D graphics within any compatible web browser without the use of plug-ins. In this paper, the visualization from the New York Times article is recreated and revised in R (R Core Team (2021)).

3 Data

Figure 2: The animation expects data with an ID and a time variable, along with a numerical variable (raw form), which is possibly converted to categorical (categorized). The data can be provided in the raw or categorized form and will be processed into the format needed for the animation, where the categorical variable is treated as a quantile and an animation frame variable is created.

The data structure must meet several requirements for the animation to be produced. First, the data set needs to be in the tidy data format (Wickham (2014)). Second, the data must have at least ID and time variables, in addition to the measured variables, which are usually numeric but can also be categorical. The ID variable identifies the individual that is followed over time, such as the company name. There may also be a grouping variable, such as the country where the company is registered.

Figure 2 illustrates the expected format of the data and variables created to prepare it for the animation. We start with the raw data structure. The values are presented in the numerical format, which we call the ‘raw’ form. In most cases, the measured variable will be numerical and require transformation. The second form is categorized data, which involves transforming numerical variables into categories, typically quantiles. This transformation may not be necessary if it is already provided in the categorized format. The last form is animated data, where the frame is assigned to each unique ID.

4 Animation tools

Figure 3: The diagram shows how the animation is done using the successive pictures. When small changes are seen in quick succession, it will appear as if the objects are in motion.

A principle important for designing a useful animation is called persistence of vision (Webster (2005)). When an image disappears, the brain retains it for a brief period of time. It is this slight period of retention that allows humans to separate sequential images. If images are shown in quick succession, the objects appear to be in motion. This is illustrated in Figure 3.

There are multiple ways to create an animation in the R environment (R Core Team (2021)), including the packages gganimate (Pedersen and Robinson (2020)) and plotly (Sievert (2020)).

The gganimate package is an extension of the ggplot2 package (Wickham (2016)) that adds a description of animation. It adds new grammar classes to the plot object that describe how the plot should change over time. The transition_*() functions achieve this by specifying how the data evolves and how it relates to itself across time. The rendering options include gifski_renderer(), which uses the gifski package (Ooms (2023b)) to save the animation in GIF format, and av_renderer(), which uses the av package (Ooms (2023a)) to save it as a video file.

The plotly software is a graphics library that provides tools for creating interactive plots in multiple programming languages, such as R, JavaScript, Python, and Julia. In R, it is accessed through the plotly package, which is built on the plotly.js JavaScript graphing library. The library can be used either through a conversion function, ggplotly(), or a standalone function, plot_ly(). The conversion takes the elements of a ggplot object and redraws them using plotly.js.

Figure 4: The diagram shows how the frames were used in the animated plot. Frame one is depicted in blue, and frame two is represented in red. Each blue component is mapped exclusively to the Frame 1 plot, while all the second-frame elements are excluded, and vice versa.

In the context of data, as shown in Figure 4, observations are positioned at specific points in time. The further the distance between these points, the less smooth the animation becomes. This issue can be eliminated by interpolating additional points in between the observations. In the gganimate package, the interpolation is achieved using the tweenr package (Pedersen (2022)), while in plotly, it utilizes d3.interpolate (Bostock (2012)).
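The effect of interpolating between observed positions can be sketched in base R. This is only an illustration of linear tweening: approx() stands in for the interpolation step and is not how tweenr or d3.interpolate are implemented, and the frame numbers and positions are made up.

```r
# One observation recorded at three time points (illustrative values).
obs_frame <- c(1, 2, 3)
obs_y     <- c(5, 2, 4)

# Linear interpolation onto 9 evenly spaced frames between the first and
# last observed frame; the extra in-between frames smooth the motion.
tween <- approx(x = obs_frame, y = obs_y, n = 9)

data.frame(frame = tween$x, y = tween$y)
```

With only the three observed frames the point would jump between positions; the in-between frames break each jump into smaller steps.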

Figure 4 demonstrates how the frame variables are applied in an animated plot. The frame variable within the animated data structure allows the animation function to determine the position of observations on the plot at any given frame.

Hasler et al. (2007) suggest that providing control options for the animation can improve the efficiency of the learning process. The length and speed of the animation should also be taken into consideration. According to Mayer (2010), working memory, which is responsible for selecting and processing information from sensory memory, generally holds a processed version of what was presented for less than thirty seconds.

In gganimate, playback controls can be obtained by setting the renderer argument to av_renderer(), which saves the animation as a video file that can be opened in the media application available on the user's system. The length and speed of the animation are adjusted with the nframes and fps arguments: nframes sets the number of frames to be rendered, while fps sets how many frames are displayed per second. The duration of the animation in seconds is then length = nframes/fps.
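The relationship between nframes, fps, and duration can be checked with a few lines of arithmetic. The helper frames_for_duration() below is hypothetical (it is not part of gganimate or animbook), and the 20-second target is an arbitrary value chosen to stay under the thirty-second guideline.

```r
# Hypothetical helper: number of frames needed for an animation of a given
# duration (in seconds) at a given frame rate.
frames_for_duration <- function(seconds, fps = 10) {
  stopifnot(seconds > 0, fps > 0)
  as.integer(ceiling(seconds * fps))
}

fps     <- 10
nframes <- frames_for_duration(seconds = 20, fps = fps)

nframes        # 200 frames
nframes / fps  # 20 seconds

# With an existing gganimate object `p2` (assumed, not created here), these
# values would be passed on as, for example:
# gganimate::animate(p2, nframes = nframes, fps = fps,
#                    renderer = gganimate::av_renderer())
```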

In the case of plotly, control integration is already implemented by default. The frame and transition arguments within the animation_opts() function can be specified to set the length and speed of the animation.

5 Visualization design

The animated visualization can be an effective communication tool (Mayer and Moreno (2002); Robertson et al. (2008)). It helps with communicating changing data values, enhancing the narrative, and keeping it engaging for the audience. According to Mayer and Moreno (2002), animation can improve learning, especially when the goal is to promote deep understanding.

Mayer (2005) states that designing multimedia requires the designer to understand how people learn. One of the principles in Mayer (2005), Redundancy, suggests that a piece of excess information could overload the learners. By this principle, the animation must be carefully designed to avoid this pitfall.

Figure 5: The plot shows the difference between the sigmoid (shown in the left figure) and the sine curve (shown in the right figure). In the sigmoid, as the curve progresses, it becomes narrower, resulting in a less accurate representation of proportions compared to the sine curve.

In the animation, the proportional shaded area has been incorporated to facilitate a quick grasp of the proportion information. It displays the proportion of observations within each group. The design also needs to account for situations where the visualization lacks an adequate number of data points. In such cases, it can be challenging to visually discern the movement pattern of the subgroup, requiring the implementation of observation interpolation.

The paths of observations need to be interpolated to create small changes in the position of points to produce the appearance of motion, as mentioned in the Animation tools section. The interpolation functions in gganimate and plotly generate only a linear path between points. However, linear paths are visually less appealing than non-linear paths. Sigmoid curves are commonly used in Sankey diagrams (Hvitfeldt (2018)). However, Shaffer (2019) argues that a sine curve more accurately represents proportion throughout the path. This is illustrated in Figure 5. The sigmoid accurately represents the proportion at the beginning and end, but as it curves, the shape gets narrower, leading to a less accurate proportion representation than the sine curve.
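The two path shapes can be compared numerically with a short sketch. The functions sine_ease() and sigmoid_ease() are hypothetical names, the exact parameterization used in the package is not stated here, and the sigmoid steepness k = 10 is an arbitrary choice. A band of constant vertical thickness looks narrower wherever the path is steeper, so the steepest step along each path is a rough indicator of how much narrowing to expect.

```r
# Sine easing: moves smoothly from 0 to 1 as t goes from 0 to 1.
sine_ease <- function(t) (sin(pi * (t - 0.5)) + 1) / 2

# Logistic sigmoid, rescaled to hit exactly 0 and 1 at the endpoints.
# The steepness k is an assumption made for illustration.
sigmoid_ease <- function(t, k = 10) {
  raw <- plogis(k * (t - 0.5))
  (raw - plogis(-k / 2)) / (plogis(k / 2) - plogis(-k / 2))
}

t <- seq(0, 1, length.out = 101)
y_start <- 5  # e.g. the top category
y_end   <- 1  # e.g. the bottom category

path_sine    <- y_start + (y_end - y_start) * sine_ease(t)
path_sigmoid <- y_start + (y_end - y_start) * sigmoid_ease(t)

# Largest change in y between neighbouring points on each path.
max_step_sine    <- max(abs(diff(path_sine)))
max_step_sigmoid <- max(abs(diff(path_sigmoid)))
max_step_sigmoid > max_step_sine
```

Both paths share the same start and end positions; under these assumptions the sigmoid is steeper in the middle, which is where the narrowing occurs.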

6 Software

Installation

The development version of animbook can be installed from GitHub with:

# install.packages("devtools")
devtools::install_github("KrisanatA/animbook")

Overview of functions

In designing the animbook package, a three-step structured approach was developed to create an animation. The initial step is to reformat the data into a categorized data structure, as seen in Figure 2. The second stage is creating a ggplot object, which can be subsequently passed into the animation function. During the second stage, an internal function will turn the categorized data into an animated data structure. The final step involves transforming the ggplot to a gganimate object by integrating the animation settings. This three-step structure was implemented to ensure that users, regardless of their level of experience, can produce the animations with simplicity while retaining customization for more experienced users.

Data preprocessing

As Figure 2 shows, numerical values need to be mapped to categories. One way to handle this is to rank the values (for example, sales) and group the rankings into quantiles. In some cases, this may not be the best option, because ranks are relative: when one observation moves up a quantile, another is bound to move down. This issue can be resolved by an alternative method, which groups observations based on their absolute values. Users may also be interested in scaling within subgroups of the data, for example, ranking within a specific country. This generalization leads to four different scaling methods for the numerical data.

# A tibble: 12 × 8
   id     time gp    values  rank rank_group absolute absolute_group
   <fct> <int> <fct>  <dbl> <int>      <int>    <int>          <int>
 1 1      2020 X      4255.     3          3        3              3
 2 1      2023 X      3357.     2          2        4              4
 3 14     2020 X      6763.     2          2        2              2
 4 14     2023 X      1197.     4          5        5              5
 5 100    2020 X      6864.     2          2        2              2
 6 100    2023 X      2321.     3          3        4              4
 7 21     2020 Y       698.     5          5        5              5
 8 21     2023 Y      3970.     2          2        4              3
 9 106    2020 Y      4110.     3          3        3              3
10 106    2023 Y      2866.     3          3        4              4
11 148    2020 Y      7174.     2          2        2              2
12 148    2023 Y      4217.     1          1        3              3
  1. Ranking by year. (rank)
  2. Ranking by year within a group. (rank_group)
  3. Fix bins relative to absolute values by year. (absolute)
  4. Fix bins relative to absolute values by year within a group. (absolute_group)

For the first and second scaling methods, it is necessary to rank the values based on time, and in cases where a group is provided, they are ranked based on both time and groups. To ensure that the rank scales among different groups are the same, the variables are first normalized to a range between 0 and 1. The third and fourth scaling methods also involve a normalization step, but they are based on raw values instead of rank values.
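The normalization step for the four scaling methods might look like the following base R sketch. The toy data, the helper normalize01(), the new column names, and the use of min-max normalization are all assumptions made for illustration; the package's internal implementation, including the direction of the ranking, may differ.

```r
# Toy data: four IDs observed at two time points in two groups.
toy <- data.frame(
  id     = rep(1:4, times = 2),
  time   = rep(c(2020, 2023), each = 4),
  gp     = rep(c("X", "X", "Y", "Y"), times = 2),
  values = c(10, 40, 20, 30, 15, 5, 50, 25)
)

# Min-max normalization to [0, 1]; a constant vector maps to 0.
normalize01 <- function(x) {
  rng <- range(x)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# 1. Rank within each time point, then normalize the ranks.
toy$rank01 <- ave(toy$values, toy$time,
                  FUN = function(v) normalize01(rank(v)))

# 2. Rank within each time point and group.
toy$rank_group01 <- ave(toy$values, toy$time, toy$gp,
                        FUN = function(v) normalize01(rank(v)))

# 3. Normalize the raw values within each time point.
toy$absolute01 <- ave(toy$values, toy$time, FUN = normalize01)

# 4. Normalize the raw values within each time point and group.
toy$absolute_group01 <- ave(toy$values, toy$time, toy$gp, FUN = normalize01)

toy
```

After this step every scaled column lies between 0 and 1, so the same set of breaks can be applied regardless of the group or method.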

All of these scaling methods use the cut() function from base R (R Core Team (2021)) to split the values into quantiles. The cut() function requires the breaks argument to be specified. If it is not provided, the prep function in this package defaults to using the seq() function, with minimum and maximum values of 0 and 1, respectively, and equal increments determined by the number of groups of interest. Users can also specify the breaks themselves, noting that the breaks need to lie between 0 and 1 and that the length of the vector needs to be the number of groups plus 1.

All of the pre-processing steps mentioned above are completed using the anim_prep() or anim_prep_cat() function, depending on the stage of the data structure. The anim_prep() function is used for the raw data format, while the anim_prep_cat() function is for the categorized data format. There are additional arguments that allow further customization.

These are only the initial steps, which format the data into categories. Now that there is a method to transform the data from the raw into the categorized format, the next step is to convert it into the animated data structure. This is done by assigning a frame to each individual observation, ensuring that no ID contains repeated frame values. The frame lets gganimate or plotly determine where each observation should be on the plot at a given frame, as seen in Figure 4.

The frame variable is assigned by sorting the data based on the ID and time using the arrange() function, followed by applying the group_by() function on the ID, allowing the row_number() function to be performed within each group. The functions mentioned in this paragraph are from the dplyr package (Wickham et al. (2023)). These steps are carried out by an internal function for each of the plots provided in the package.
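A minimal sketch of the binning step, assuming values that have already been scaled to lie between 0 and 1. The arguments include.lowest = TRUE and labels = FALSE are choices made here for illustration and may not match the package internals.

```r
ncat <- 4

# Default breaks: equal steps from 0 to 1, giving ncat + 1 break points.
breaks <- seq(0, 1, length.out = ncat + 1)
breaks

scaled <- c(0, 0.1, 0.25, 0.6, 0.9, 1)

# Integer category codes; include.lowest keeps the value 0 in the first bin.
category <- cut(scaled, breaks = breaks, include.lowest = TRUE, labels = FALSE)
category

# User-supplied breaks for three groups: values in [0, 1], length 3 + 1.
custom_breaks   <- c(0, 0.1, 0.5, 1)
custom_category <- cut(scaled, breaks = custom_breaks,
                       include.lowest = TRUE, labels = FALSE)
custom_category
```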
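The frame assignment can be sketched on a toy categorized data set. To keep the example free of dependencies it uses a base R equivalent, with the dplyr pipeline described above shown in a comment; the use of mutate() in that comment and the toy column values are assumptions, and this is not the package's internal function.

```r
# Toy categorized data, deliberately unsorted.
categorized <- data.frame(
  id    = c("b", "a", "b", "a", "a"),
  time  = c(2021, 2020, 2020, 2022, 2021),
  qtile = c(2, 1, 3, 2, 1)
)

# dplyr version of the described steps (requires dplyr):
# categorized |>
#   dplyr::arrange(id, time) |>
#   dplyr::group_by(id) |>
#   dplyr::mutate(frame = dplyr::row_number()) |>
#   dplyr::ungroup()

# Base R equivalent: sort by ID and time, then number the rows within each ID.
animated <- categorized[order(categorized$id, categorized$time), ]
animated$frame <- ave(seq_along(animated$id), animated$id, FUN = seq_along)
rownames(animated) <- NULL

animated
```

Each ID now has frames 1, 2, ... in time order, so no ID contains a repeated frame value.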

Table 1: The required arguments for the prep functions.
Argument  Description
data      A data frame containing the data to be prepared for visualization.
id        The column name that represents the unique identifier variable.
values    The column name that contains the numeric values to be visualized.
time      The column name that represents the time variable.

Table 2: The arguments in anim_prep() for customizing the data scaling.
Argument       Description
ncat           The number of groups or categories to create for scaling values.
breaks         A vector of breaks for creating bins.
group_scaling  The column name that represents the grouping variable.
scaling        The scaling method to be used: “rank” or “absolute”.

Table 3: The argument in anim_prep_cat() for customizing the order of the categories.
Argument  Description
order     A vector of order for sorting the category values.

Table 4: The additional arguments for customizing the aesthetics of the visualization.
Argument        Description
group           The column name of the variable that distinguishes between groups of values.
label           A vector of labels to be used for the y-axis in the visualization.
time_dependent  Logical. Should the visualization be time-dependent? Default is TRUE.

The anim_prep() and anim_prep_cat() functions have required arguments that need to be specified. These are listed in Table 1.

As shown previously, there are four different scaling methods in the anim_prep() function. Both the choice and the customization of the scaling are set using the arguments in Table 2.

For the anim_prep_cat() function, the order of the categories can be set using the argument in Table 3.

Finally, the arguments in Table 4 can be adjusted to further customize how the final visualization looks.

Plotting function

Once the data is prepared, the next step is to create the ggplot object that serves as the basis for the animation. There are three plots available in this package: two can be used for animation, and the third is a static visualization. Each plotting function has an internal function that converts the categorized data format into the animated data structure.

The main focus of this paper will be on the wallaby_plot(), which draws inspiration from the New York Times animation and combines the knowledge gained from the previous sections in creating the visualization.

The wallaby_data() function is responsible for performing data manipulation and formatting tasks on the original object, including creating additional data components for labeling and shading. This function is also responsible for interpolating the non-linear path. It performs this task by mapping the sine() function provided in this package to each observation using the map() function from the purrr package (Wickham and Henry (2023)). The frame is recalculated using the same method mentioned in the data preprocessing section. The proportional_shade() function is called to generate the proportional shaded data.

Figure 6: The plot shows how the algorithm for the proportional_shade() function works. The left figure represents the initial step of the algorithm, which calculates all the corner points. The right figure demonstrates the subsequent step, where points in-between the left and right are interpolated using the sine() function.

The algorithm behind the proportional_shade() function for generating the proportional shaded data is illustrated in Figure 6. It starts by calculating the corner points of all the shaded areas. The sine() function is then used to interpolate the points between the left and right sides.

Now that the data are in the right format for the wallaby plot, the geom_point() function is used for plotting the observations, geom_polygon() is used for creating the proportional shaded areas, and geom_text() is used for creating labels. These three functions are from the ggplot2 package. During this process, the aesthetics mapping will be different depending on the rendering tool to be used, which can be either gganimate or plotly. The difference between the two rendering tools is that for plotly, the ids and frame arguments need to be specified during the creation of the ggplot object.
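The two steps can be sketched for a single shaded band. The functions sine_ease() and shade_polygon() are hypothetical stand-ins written for this illustration; they are not the package's sine() or proportional_shade() functions, and the corner coordinates are made up.

```r
# Sine easing from 0 to 1 (assumed form of the interpolation).
sine_ease <- function(t) (sin(pi * (t - 0.5)) + 1) / 2

# Hypothetical helper: build the outline of one shaded band.
# `left` and `right` hold the c(lower, upper) y-limits at each side,
# which together with x_left and x_right define the four corner points.
shade_polygon <- function(x_left, x_right, left, right, n = 50) {
  t <- seq(0, 1, length.out = n)
  x <- x_left + (x_right - x_left) * t
  lower <- left[1] + (right[1] - left[1]) * sine_ease(t)
  upper <- left[2] + (right[2] - left[2]) * sine_ease(t)
  # Trace the upper edge left to right, then the lower edge back again,
  # so the points form a closed outline.
  data.frame(x = c(x, rev(x)), y = c(upper, rev(lower)))
}

band <- shade_polygon(x_left = 0, x_right = 10,
                      left = c(4.5, 5.5), right = c(0.6, 1.0))
head(band, 3)
```

A data frame in this shape, with one x and one y column tracing the outline, is the kind of input that geom_polygon() in ggplot2 draws as a filled area.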

Table 5: The arguments for the wallaby_plot() function.
Argument       Description
data           The categorized data returned from the prep function.
group_palette  The vector of the palette used by the function to supply the color to each group.
shade_palette  The vector of the palette used by the function to supply the color to the shaded area.
rendering      The choice of method used to create and display the plot, either gganimate or plotly.
subset         A character string specifying the variable used for subsetting the data. The “top” and “bottom” strings can also be used in this argument.
relation       The choice of relationship for the values to display on the plot, either “one_many” or “many_one”.
total_point    The number of points the users want for the wallaby plot. Default is NULL, which is the number of points equal to the original.
height         The proportion of the area occupied by the observations in the shaded areas.
width          The distance between the first and last observation in the animation.
size           The point size.
alpha          The opacity of the proportional shaded areas.

Animating function

It is necessary to save the plot as a ggplot object before passing it to the final function, anim_animate(), for animation. This function automatically detects which rendering method was specified in the previous steps and adds the minimum required functions accordingly. By default, if the user specified the rendering as gganimate, it adds the transition_time() function from the gganimate package. Otherwise, it adds the animation_opts() function from the plotly package.

Example usage

In this section, the package usage is demonstrated using the cat_change dataset included in the package. This is a simulated dataset in which observations move from category A to category E between two time points. Since this data is already in the categorized data structure, the anim_prep_cat() function is used. The output produced by this function is referred to as the categorized data.

animbook <- anim_prep_cat(cat_change, 
                          id = id, 
                          values = qnt, 
                          time = time, 
                          group = gp)

head(animbook, 10)
# A tibble: 10 × 5
   id     time qtile label group
   <fct> <int> <dbl> <chr> <fct>
 1 1      2020     5 A     X    
 2 2      2020     5 A     X    
 3 3      2020     5 A     X    
 4 4      2020     5 A     X    
 5 5      2020     5 A     X    
 6 6      2020     5 A     X    
 7 7      2020     5 A     X    
 8 8      2020     5 A     X    
 9 9      2020     5 A     X    
10 10     2020     5 A     X    
str(animbook)
tibble[,5] (S3: tbl_df/tbl/data.frame/categorized)
 $ id   : Factor w/ 200 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ time : int [1:400] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
 $ qtile: num [1:400] 5 5 5 5 5 5 5 5 5 5 ...
 $ label: chr [1:400] "A" "A" "A" "A" ...
 $ group: Factor w/ 2 levels "X","Y": 1 1 1 1 1 1 1 1 1 1 ...

To obtain a ggplot object for the animation function, the wallaby_plot() function is used. For further customization, the shade_palette, subset, and relation arguments are applied to define the final appearance of the animation.

p <- wallaby_plot(data = animbook,
             group_palette = RColorBrewer::brewer.pal(9, "Set1"),
             shade_palette = c("#737373", "#969696", "#BDBDBD",
                               "#D9D9D9","#D9D9D9","#D9D9D9"),
             rendering = "gganimate",
             subset = "top",
             relation = "one_many",
             total_point = NULL)

p

Once the ggplot object is obtained, the anim_animate() function can be used to transform it into a gganimate object. This can then be passed to the animate() function from the gganimate package for rendering.

p2 <- anim_animate(p)

gganimate::animate(p2)

Figure 7: Animate visualization using example data. All of the X observations stay within the same group, while Y observations change from group A to group E.

This is an example of how the users can apply the three-step process in creating an animation plot using the animbook package.

7 Application

Accounting database: osiris

The accounting Osiris data that will be used in this section was collected from Bureau van Dijk (Osiris (2023)). This data set comprises 30,000 rows and 94 variables of information on listed and major unlisted/delisted companies worldwide. The variables of interest from this data set are ID, year, country, and sales. A subset of companies from this data set, recorded from 2006 to 2018, is used. This sample is included in the package.

An OECD report (McGowan et al. (2017)) claims that the United States has a faster metabolic rate than Japan. We use the wallaby plot and kangaroo plot animation to explore this. The steps are:

library(animbook)
library(dplyr)  # provides filter()

data <- osiris |> 
  filter(country %in% c("US", "JP"))

label <- c("Top 25%", "25-50", "50-75", "75-100", "Not listed")

accounting <- anim_prep(data, 
                      id = ID, 
                      values = sales, 
                      time = year, 
                      label = label, 
                      ncat = 4, 
                      group = country)

kan_p <- kangaroo_plot(accounting)

p <- wallaby_plot(accounting,
                  group_palette = RColorBrewer::brewer.pal(9, "Set1"),
                  shade_palette = c("#737373", "#969696", "#BDBDBD",
                                    "#D9D9D9","#D9D9D9","#D9D9D9"),
                  subset = "bottom",
                  relation = "many_one",
                  height = 1,
                  size = 2,
                  width = 100,
                  total_point = 1000)

kan_p2 <- anim_animate(kan_p)

p2 <- anim_animate(p)
gganimate::animate(kan_p2)

gganimate::animate(p2)

Figure 8: The kangaroo plot visualization shows the movement of the Japanese and US companies between the performance sales quantiles from 2006 to 2018 for a sample of data extracted from the Osiris database. The ‘not listed’ category indicates companies not yet listed or removed from the listing. Most companies stay in the same quantile group, with a small number moving up and down. Most of the movement is made by American companies, which supports the OECD report.

Figure 9: The wallaby plot visualization shows the companies that exited the market. There are more United States companies that fall down into a not listed group (got de-listed) compared to Japanese companies.

In the kangaroo plot (Figure 8), most companies stay in the same quantile group, with only a small number moving upward or downward. Most of the movement is made by American companies, supporting the OECD report (McGowan et al. (2017)) that the United States has a higher turnover rate than Japan. The wallaby plot (Figure 9) emphasizes this point: clearly more American companies than Japanese companies were de-listed, and no Japanese company that was in the Top 25% in 2006 was de-listed.
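
The comparison read off the wallaby plot can also be checked numerically with a simple cross-tabulation. The following is a minimal sketch in base R on a toy data frame. The data, the column names (`start`, `end`), and the category labels are all hypothetical and are not the Osiris sample. It keeps the companies that end in the bottom category and counts them by country and starting quantile, which is one plausible reading of the subset = "bottom", relation = "many_one" view.

```r
# Toy data (hypothetical, NOT the Osiris sample): one row per company,
# with its sales quartile in the first and in the last year.
firms <- data.frame(
  id      = 1:8,
  country = c("US", "US", "US", "US", "JP", "JP", "JP", "JP"),
  start   = c("Top 25%", "25-50%", "50-75%", "75-100%",
              "Top 25%", "25-50%", "50-75%", "75-100%"),
  end     = c("Not listed", "25-50%", "Not listed", "Not listed",
              "Top 25%", "25-50%", "50-75%", "Not listed"),
  stringsAsFactors = FALSE
)

# Keep only the companies that end in the bottom ("Not listed") category.
delisted <- firms[firms$end == "Not listed", ]

# Count de-listed companies by country and by the quartile they started in.
delisted_counts <- table(country = delisted$country, start = delisted$start)

# Total number of de-listed companies per country.
delisted_by_country <- rowSums(delisted_counts)

print(delisted_counts)
print(delisted_by_country)
```

On real data, the same table would show whether any country has zero de-listed companies starting from a given quantile, which is the kind of statement made about the Top 25% above.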

Voter behavior

The Australian Election Study data was obtained from the ADA Dataverse (McAllister et al. (2023)). The dataset included in the animbook package contains 1,468 rows and 4 variables: ID, year, party, and gender. It comprises survey responses from the 2019 election. The year column was constructed from two different survey questions, so that the data can show how well the top party in the 2016 Australian election retained its voters, by gender.
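
Building a year column from two survey questions amounts to stacking a wide table into a long one. The sketch below illustrates the idea in base R, assuming a hypothetical wide extract with one column per question (`vote_2016` and `vote_2019`). These column names and the toy responses are assumptions for illustration, not the actual Australian Election Study variable names or the authors' processing code.

```r
# Hypothetical wide survey extract: one row per respondent, with the party
# voted for in 2016 (recalled) and in 2019. Column names are assumptions.
survey <- data.frame(
  id        = 1:4,
  gender    = c("Female", "Male", "Other", "Female"),
  vote_2016 = c("Liberal", "Liberal", "Liberal", "Labor"),
  vote_2019 = c("Liberal", "Labor", "Greens", "Labor"),
  stringsAsFactors = FALSE
)

# Stack the two questions so each respondent contributes one row per year,
# giving the four variables id, year, party, and gender.
long <- rbind(
  data.frame(id = survey$id, year = 2016, party = survey$vote_2016,
             gender = survey$gender, stringsAsFactors = FALSE),
  data.frame(id = survey$id, year = 2019, party = survey$vote_2019,
             gender = survey$gender, stringsAsFactors = FALSE)
)

# Order by respondent and year so each respondent's two rows are adjacent.
long <- long[order(long$id, long$year), ]
rownames(long) <- NULL

print(long)
```

A table in this long form has the shape that the id, values, time, and group arguments of anim_prep_cat() refer to.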

voter <- anim_prep_cat(data = aeles,
                       id = id,
                       values = party,
                       time = year,
                       group = gender,
                       order = NULL)

p_voter <- wallaby_plot(data = voter,
                        group_palette = c("pink", "blue", "red"),
                        shade_palette = c("#737373", "#969696", "#BDBDBD",
                                          "#D9D9D9", "#D9D9D9", "#D9D9D9"),
                        rendering = "gganimate",
                        subset = "top",
                        relation = "one_many",
                        height = 1,
                        size = 2.5,
                        width = 100,
                        total_point = 1000)

p2_voter <- anim_animate(p_voter)
gganimate::animate(p2_voter)

Figure 10: The wallaby plot visualization shows how well the top party retains its previous voters, by gender. Most voters remain loyal to the party, but a small fraction, with a roughly equal male-to-female ratio, switch primarily to the other major party. Interestingly, individuals who identified as ‘others’ overwhelmingly shifted their party affiliation to the Greens Party.

Figure 10 reveals an intriguing pattern: individuals who identified their gender as ‘others’ shifted their voting preference from the Liberal Party, the leading party in 2016, to the Greens Party. This change in behavior could be linked to the Greens Party’s “A FAIRER, MORE EQUAL COMMUNITY” campaign, which advocates for full equality under the law and communities for LGBTIQ+ individuals.
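
A retention pattern of this kind can be summarized as row proportions: among voters who started with the top party, what share of each gender group went to each party at the next election. The sketch below computes this in base R on made-up data. The responses, the column names (`party_2016`, `party_2019`), and the choice of "Liberal" as the top party are assumptions for illustration only, and the numbers are not results from the Australian Election Study.

```r
# Toy data (hypothetical, NOT the AES data): one row per respondent.
voters <- data.frame(
  gender     = c("Female", "Female", "Female", "Male", "Male", "Male",
                 "Other", "Other", "Female", "Male"),
  party_2016 = c(rep("Liberal", 8), "Labor", "Labor"),
  party_2019 = c("Liberal", "Liberal", "Labor", "Liberal", "Liberal", "Labor",
                 "Greens", "Greens", "Labor", "Labor"),
  stringsAsFactors = FALSE
)

# Assumed top party for this toy example.
top_party <- "Liberal"

# Keep only respondents who voted for the top party in 2016.
from_top <- voters[voters$party_2016 == top_party, ]

# Cross-tabulate gender against the 2019 party, then convert each row
# to proportions so that every gender group sums to 1.
flows  <- table(gender = from_top$gender, party_2019 = from_top$party_2019)
shares <- prop.table(flows, margin = 1)

print(round(shares, 2))
```

The diagonal-like entry for the top party gives the retention rate per group, and the remaining entries in each row show where the defecting voters went.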

Liberal Party analysts may find it worthwhile to delve into this topic further, considering both the benefits and drawbacks. If the Liberal Party wishes to retain this demographic of voters, they may need to consider implementing a campaign focused on LGBTIQ+ issues. However, this may also lead to a further defection of conservative voters.

8 Summary

Inspired by the New York Times article, this package provides tools to facilitate the communication of complex data to general audiences. In the current version of animbook, the feature to have different speeds for each observation has not yet been implemented. This feature would introduce a new dimension that allows users to explore the data further. The plotly rendering has also not yet been optimized. Additionally, new animated and static plots could be added, as seen in the New York Times article (Badger et al. (2018)), to compare the demographics at a specific point in time. Finally, the cli package (Csárdi (2023)) could be used for a more aesthetically pleasing command-line interface.

Acknowledgements

This paper was created using the rjtools package (O’Hara-Wild et al. (2023)), and the dplyr, knitr, gganimate, and kableExtra packages were used to create the visuals included in this paper. This paper is based on version 0.0.0.9 of the animbook package, which is available on GitHub. I also acknowledge the use of ChatGPT (https://chat.openai.com/) to improve the text, grammar, and spelling.

CRAN packages used

gganimate, plotly, ggplot2, gifski, av, tweenr, dplyr, purrr, cli, rjtools, knitr, kableExtra

CRAN Task Views implied by cited packages

Databases, ModelDeployment, Phylogenetics, ReproducibleResearch, Spatial, TeachingStatistics, WebTechnologies

E. Badger, C. C. Miller, A. Pearce and K. Quealy. Extensive data shows punishing reach of racism for black boys. The New York Times, 2018. URL https://www.nytimes.com/interactive/2018/03/19/upshot/race-class-white-and-black-men.html.
M. Bostock. D3.js - data-driven documents. 2012. URL http://d3js.org/.
R. J. Caballero, T. Hoshi and A. K. Kashyap. Zombie lending and depressed restructuring in Japan. The American Economic Review, 98(5): 1943–1977, 2008. URL http://www.jstor.org/stable/29730158 [online; last accessed October 8, 2023].
R. Chetty, N. Hendren, M. Jones and S. Porter. Race and economic opportunity in the United States: An intergenerational perspective. The Quarterly Journal of Economics, 135: 711–783, 2020. DOI 10.1093/qje/qjz042.
G. Csárdi. Cli: Helpers for developing command line interfaces. 2023. URL https://CRAN.R-project.org/package=cli. R package version 3.6.1.
B. Hasler, B. Kersten and J. Sweller. Learner control, cognitive load and instructional animation. Applied Cognitive Psychology, 21: 713–729, 2007. DOI 10.1002/acp.1345.
E. Hvitfeldt. Recreate - Sankey flow chart. 2018. URL https://emilhvitfeldt.com/post/2018-03-20-recreate-sankey-flow-chart/.
R. Mayer. Applying the science of learning to medical education. Medical education, 44: 543–9, 2010. DOI 10.1111/j.1365-2923.2010.03624.x.
R. E. Mayer. Cognitive theory of multimedia learning. In The Cambridge handbook of multimedia learning, Ed. R. Mayer, pages 31–48, 2005. Cambridge University Press. DOI 10.1017/CBO9780511816819.004.
R. E. Mayer and R. Moreno. Educational Psychology Review, 14(1): 87–99, 2002. DOI 10.1023/a:1013184611077.
I. McAllister, J. Sheppard, S. Cameron and J. Simon. Australian election study, 2022. 2023. URL http://dx.doi.org/10.26193/W3U2S3.
M. A. McGowan, D. Andrews and V. Millot. The walking dead? (1372): 2017. URL https://www.oecd-ilibrary.org/content/paper/180d80ad-en.
Mozilla Foundation. WebGL. 2011. URL www.khronos.org/webgl/.
M. O’Hara-Wild, S. Kobakian, H. S. Zhang, D. Cook, S. Urbanek and C. Dervieux. Rjtools: Preparing, checking, and submitting articles to the ’r journal’. 2023. URL https://CRAN.R-project.org/package=rjtools. R package version 1.0.10.
J. Ooms. Av: Working with audio and video in r. 2023a. URL https://CRAN.R-project.org/package=av. R package version 0.8.4.
J. Ooms. Gifski: Highest quality GIF encoder. 2023b. URL https://CRAN.R-project.org/package=gifski. R package version 1.12.0-1.
Osiris. bvd, 2023. URL https://www.bvdinfo.com/en-gb/our-products/data/international/osiris.
T. L. Pedersen. Tweenr: Interpolate data for smooth animations. 2022. URL https://CRAN.R-project.org/package=tweenr. R package version 2.0.2.
T. L. Pedersen and D. Robinson. Gganimate: A grammar of animated graphics. R package version, 1(7): 403–408, 2020.
R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2021. URL https://www.R-project.org/.
G. Robertson, R. Fernandez, D. Fisher, B. Lee and J. Stasko. Effectiveness of animation in trend visualization. IEEE Transactions on Visualization and Computer Graphics, 14(6): 1325–1332, 2008. DOI 10.1109/TVCG.2008.125.
J. Shaffer. Sankey diagrams: Why I used the sigmoid function and why you probably shouldn’t. Data + science, 2019. URL https://www.dataplusscience.com/Sigmoid.html.
C. Sievert. Interactive Web-Based data visualization with r, plotly, and shiny. 2020. URL https://plotly-r.com.
C. Webster. Animation : The mechanics of motion. Oxford ; Burlington, MA: Elsevier Focal Press, 2005.
H. Wickham. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org.
H. Wickham. Tidy data. Journal of Statistical Software, 59(10): 1–23, 2014. URL https://www.jstatsoft.org/index.php/jss/article/view/v059i10.
H. Wickham, R. François, L. Henry, K. Müller and D. Vaughan. Dplyr: A grammar of data manipulation. 2023. https://dplyr.tidyverse.org, https://github.com/tidyverse/dplyr.
H. Wickham and L. Henry. Purrr: Functional programming tools. 2023. URL https://CRAN.R-project.org/package=purrr. R package version 1.0.2.

References

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Anukarnsakulchularp & Cook, "animbook: Visualizing changes in performance measures and demographic affiliations using animation", The R Journal, 2023

BibTeX citation

@article{animbook-journal,
  author = {Anukarnsakulchularp, Krisanat and Cook, Dianne},
  title = {animbook: Visualizing changes in performance measures and demographic affiliations using animation},
  journal = {The R Journal},
  year = {2023},
  issn = {2073-4859},
  pages = {1}
}